CANDI: A Web Server for Predicting Molecular Targets and Pathways of Cannabis-Based Therapeutics

Background: Cannabis sativa, with a rich history of traditional medicinal use, has garnered significant attention in contemporary research for its potential therapeutic applications in various human conditions, including pain, inflammation, cancer, and osteoarthritis. However, the specific molecular targets and mechanisms underlying the synergistic effects of its diverse phytochemical constituents remain elusive. Understanding these mechanisms is crucial for developing targeted, effective cannabis-based therapies.
Methods: To investigate the molecular targets and pathways involved in the synergistic effects of cannabis compounds, we used DRIFT, a deep learning model that leverages attention-based neural networks to predict compound-target interactions. We considered both whole plant extracts and specific plant-based formulations. Predicted targets were then mapped to the Reactome pathway database to identify the biological processes affected. To facilitate the prediction of molecular targets and associated pathways for any user-specified cannabis formulation, we developed CANDI (Cannabis-derived compound Analysis and Network Discovery Interface), a web-based server. This platform offers a user-friendly interface for researchers and drug developers to explore the therapeutic potential of cannabis compounds.
Results: Our analysis using DRIFT and CANDI identified numerous predicted molecular targets of cannabis compounds, many of which are involved in pathways relevant to pain, inflammation, cancer, and other diseases. The CANDI server enables researchers to predict the molecular targets and affected pathways for any specific cannabis formulation, providing valuable insights for developing targeted therapies.
Conclusions: By combining computational approaches with knowledge of traditional cannabis use, we have developed the CANDI server, a tool that allows us to harness the therapeutic potential of cannabis compounds for the effective treatment of various disorders. By bridging traditional pharmaceutical development with cannabis-based medicine, we propose a novel approach for botanical-based treatment modalities.


Introduction
Cannabis sativa is among the most ancient cultivated plants, with evidence suggesting its utilization may date back nearly a million years (Ren et al., 2021). Its multifaceted advantages, particularly as a source of fiber, have led to its extensive use in both agricultural and industrial applications (Fordjour et al., 2023; H.-L. Li, 1974). Currently, cannabis is consumed for medicinal and recreational purposes and is recognized for its various derived metabolites, including terpenoids, flavonoids, sterols, and phytocannabinoids (Simiyu et al., 2022). Phytocannabinoids have been comprehensively reviewed and are reported to interact with a complex network of receptors and signaling pathways that play a crucial role in modulating various physiological processes, including pain perception, appetite, mood, and memory (Bonn-Miller et al., 2018; Pacher et al., 2006; Zou & Kumar, 2018). The principal psychoactive constituent of cannabis, Δ9-tetrahydrocannabinol (THC), has been the focus of wide-ranging investigation and is the only approved cannabinoid-based prescription for the treatment of chemotherapy-induced nausea in patients (Badowski, 2017; Ng et al., 2024). However, the therapeutic potential of cannabis extends far beyond THC, with numerous other cannabinoids and terpenes exhibiting promising pharmacological activities (Alves et al., 2020).
Cannabis, formerly banned almost universally, has undergone a substantial shift in perception, with countries such as the United States and Canada acknowledging its long-standing traditional medicinal use and legalizing its usage. This paradigm shift has been driven by scientific research and an emerging understanding of the potential therapeutic benefits of cannabis and its active compounds (Dalli et al., n.d.). Modern computational and experimental studies on phytocannabinoids and other cannabis-derived compounds have elucidated their medicinal value in the treatment of diverse human disorders, including inflammatory bowel disease (IBD), cancer, Alzheimer's disease, and Parkinson's disease (Varshney et al., 2023). Consequently, the integration of cannabis-based therapeutics into conventional medical practice continues to expand, offering new treatment avenues and improved outcomes for patients with these debilitating conditions (Scherma et al., 2020). The historical significance and value of cannabis thus further emphasize the importance of cannabis-based drug discovery, which advances our understanding of its therapeutic potential and facilitates its integration into modern medical practice.
Naturally occurring chemical compounds from various sources are vital in diverse biological activities and are at the forefront of drug discovery studies. However, identifying the targets of these compounds remains a bottleneck in understanding their mechanisms of action (G. Li et al., 2021; Newman & Cragg, 2016). Experimental techniques, such as affinity chromatography, protein microarrays, and genomic or proteomic studies, are typically employed for target identification, but they are highly time-consuming and relatively expensive (Cheng et al., 2011; Zhang et al., 2022). Moreover, in contrast to the assumptions of traditional drug development strategies, compounds often interact with multiple targets, which presents a further limitation for experimental approaches (G. Li et al., 2021). Computational methods offer an alternative by employing various algorithms to identify targets for compounds. For instance, network-based approaches, data mining, and machine learning have been used to predict targets for compounds (Agamah et al., 2019; Ezzat et al., 2019; Nogueira & Koch, 2019).
Moreover, the recent development of deep learning networks has expanded the scope and improved the predictive power of target identification from biological databases, which have grown enormously and now hold abundant data on protein-ligand complexes. Deep learning models can effectively analyze large datasets and complex biological networks, making them increasingly valuable in modern drug target identification (Askr et al., 2023; Chen et al., 2024; Zeng et al., n.d.; Zhou et al., 2023). DRIFT is one such model: it maps targets to compounds using a neural network architecture that predicts target-compound binding affinity, with the Yuel algorithm in the back end (Chirasani et al., 2022; Wang & Dokholyan, 2022). Hence, computational methods that integrate critical target data from many sources to discern compound-target relationships and extrapolate potential targets for pharmaceuticals and bioactive substances provide a valid approach to understanding compound-target interactions. Furthermore, pathways play a pivotal role in elucidating the intricate nature of various diseases, as proteins function within complex networks of interactions (Liu & Chance, 2013). Complex diseases often arise from the dysregulation of multiple targets within interconnected pathways or from variations in different genes within the same pathways across diverse patient populations (Y.-A. Kim et al., 2011). Hence, elucidating the relationship between targets and disease-associated pathways is crucial for comprehending disease mechanisms and holds promise for developing efficacious treatments.
Despite significant advances, several critical knowledge gaps persist in our understanding of cannabis pharmacology and its therapeutic potential. The mechanisms of action, efficacy, and safety profiles of cannabis compounds need further exploration. Furthermore, the variability in cannabis strains, the lack of standardized formulations, and the potential adverse effects associated with long-term use pose significant challenges to the development of cannabis-based therapeutics. Addressing these gaps is imperative for advancing our understanding of cannabis pharmacology and translating it into safe and effective treatments for a wide range of disorders. In light of these considerations, we focused this study on cannabis-based drug discovery that aims to harness the synergistic effects of the plant's diverse phytochemical constituents, a phenomenon known as the "entourage effect." This strategy recognizes that the therapeutic efficacy of cannabis may not be solely attributable to a single compound but rather to the intricate interplay between multiple cannabinoids, terpenes, and other compounds present in the plant (Ferber et al., 2020; Koltai & Namdar, 2020). Accumulating data from numerous studies suggest that cannabis extracts or combinations of cannabis-derived compounds may elicit synergistic effects in alleviating pain, reducing inflammation, and mitigating psychoactive effects (Anand et al.). Therefore, leveraging computational algorithms, we seek to elucidate the complex synergistic interactions between cannabinoids, terpenes, and other bioactive constituents of the cannabis plant. This multi-faceted approach aims to identify potential therapeutic targets, optimize drug formulations, and ultimately contribute to the development of innovative and effective cannabis-based therapies for a wide range of medical conditions. Furthermore, we have developed a user-friendly web interface (CANDI, http://candi.dokhlab.org) to facilitate the prediction of targets and relevant pathways for cannabis compounds and formulations, thereby streamlining the drug discovery process and enhancing accessibility for researchers and clinicians alike. Hence, our study contributes to the advancement of drug discovery efforts aimed at harnessing the therapeutic potential of cannabis compounds for the effective treatment of various disorders.

Materials and Methods
Data Curation and Compilation. The initial dataset of compounds sourced from the cannabis plant was curated from Keystone State Testing, a Pennsylvania state-approved cannabis laboratory (Raup-Konsavage et al., 2020). These compounds were systematically classified into three main categories: cannabinoids, terpenes, and flavonoids. In total, the dataset comprises 73 compounds: 16 cannabinoids, 39 terpenes, and 18 flavonoids (Table 1).
Target Prediction. Targets for the cannabis compounds were first predicted using the DRIFT algorithm, with SMILES notation as the input format. The predicted targets were then refined to include only protein targets. The resulting sorted targets and their scores were used for subsequent analyses. The results were structured as a matrix (C, T), where C represents compounds and T represents targets, with each entry holding the corresponding predicted score. Subsequently, for each compound, the user-provided formulation was incorporated as weights ($W_j$), which were multiplied with the corresponding scores in the matrix (MScore). The resulting values were summed over all targets (j = 1 to n), yielding a final score for each compound:
$$\text{Score} = \sum_{j=1}^{n} W_j \cdot \text{MScore}_j$$
This computation yielded the final results, which were concatenated, normalized, and sorted by normalized score. Ultimately, the targets associated with the user-provided formulation for the set of compounds were obtained, along with their normalized scores.
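The weighting step described above can be illustrated with a minimal sketch. It is not the CANDI implementation. The function name `rank_targets`, the dictionary layout, and the numeric values are hypothetical, and two assumptions are made because the text leaves them open: the weighted scores are accumulated per target across the compounds in the formulation (so that the output is a ranked target list), and normalization divides each combined score by the sum of all combined scores.

```python
def rank_targets(mscore, weights):
    """Combine per-compound target scores into formulation-level target scores.

    mscore:  {compound: {target: predicted score}}  (hypothetical layout)
    weights: {compound: formulation weight}
    Returns a list of (target, normalized score), highest score first.
    """
    combined = {}
    for compound, weight in weights.items():
        # Compounds without predictions contribute nothing.
        for target, score in mscore.get(compound, {}).items():
            combined[target] = combined.get(target, 0.0) + weight * score

    # Assumed normalization: scale so that all target scores sum to 1.
    total = sum(combined.values())
    if total > 0:
        combined = {t: s / total for t, s in combined.items()}

    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Illustrative numbers only; these are not DRIFT predictions.
mscore = {
    "THC": {"CB1": 0.9, "CB2": 0.6},
    "CBD": {"CB2": 0.5, "GPR55": 0.4},
}
weights = {"THC": 0.25, "CBD": 0.75}
ranked = rank_targets(mscore, weights)
```

In this toy input, CB2 receives contributions from both compounds, which is why a target shared across a formulation can outrank a target that scores highly for only one minor component.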
Pathway Mapping. We undertook a systematic curation process to map the pathways associated with the identified targets using data from the Reactome database (Milacic et al., 2024). First, UniProt identifiers were mapped to pathways with an in-house Python script. The UniProt identifiers and their corresponding normalized scores from the target analysis were then used as input for pathway prediction, with the scores serving as weights. The mapping procedure involved querying the Reactome database to retrieve the pathways associated with each UniProt entry. The retrieved pathways were concatenated into a comprehensive list and ranked by pathway score. To compute the pathway score, the weights of the UniProt identifiers mapped to each pathway were summed and divided by the total number of UniProt identifiers provided as input.
$$\text{Pathway score} = \frac{\sum_{i=1}^{NM} MW_i}{NT},$$
where MW, NM, and NT represent the mapped target weights, the number of mapped targets, and the total number of targets, respectively.
This systematic approach supports the prediction of pathways associated with the identified targets, enhancing our understanding of the biological processes influenced by the investigated compounds.
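The pathway score described above (summed weights of the targets mapped to a pathway, divided by the total number of input targets) can be sketched as follows. This is an illustrative outline rather than the in-house script. The function name `score_pathways`, the placeholder identifiers (`U1`, `PW_A`, and so on), and the in-memory mapping that stands in for the Reactome lookup are all hypothetical.

```python
def score_pathways(target_weights, target_to_pathways):
    """Rank pathways by summed mapped-target weight divided by total targets.

    target_weights:     {UniProt ID: normalized target score}
    target_to_pathways: {UniProt ID: iterable of pathway names}
                        (stand-in for a Reactome lookup)
    Returns a list of (pathway, score), highest score first.
    """
    total_targets = len(target_weights)  # NT in the text
    if total_targets == 0:
        return []

    summed = {}
    for uniprot_id, weight in target_weights.items():
        # Targets with no pathway annotation still count toward NT.
        for pathway in target_to_pathways.get(uniprot_id, ()):
            summed[pathway] = summed.get(pathway, 0.0) + weight

    scored = [(pathway, s / total_targets) for pathway, s in summed.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder identifiers and weights for illustration only.
target_weights = {"U1": 0.5, "U2": 0.3, "U3": 0.2}
target_to_pathways = {"U1": ["PW_A", "PW_B"], "U2": ["PW_A"]}
pathway_ranking = score_pathways(target_weights, target_to_pathways)
```

Dividing by the total number of input targets, rather than by the number of mapped targets, means a pathway supported by only a small share of the input is scored lower even when those few targets carry high weights.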
Compound-Target-Pathway Similarity Analysis. To assess the relationship between compounds and targets, we used the DRIFT predictions for the cannabis compounds to establish an indirect relationship between them. We used the target information and scores to generate a vector representation for each compound. These vectors served as the basis for computing cosine similarity scores, enabling the quantification of compound-target relationships.
$$\text{similarity}(A, B) = \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert},$$
where A and B represent the vectors corresponding to two compounds. The computed similarity scores were visualized as a heatmap using the Matplotlib library in Python. This visualization gives users an intuitive view of the degree of similarity between compounds and their associated targets, and allows researchers to rapidly identify patterns and potential areas of interest within the dataset. We extended the analysis by curating the pathways associated with the compounds using the Reactome database, a comprehensive open-source database of human biological processes. This additional step allowed us to map the similarity between compounds in terms of their related pathways, providing a more holistic view of the potential biological impacts of these substances.
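As a rough illustration of the similarity computation, the sketch below builds a cosine similarity matrix from per-compound target-score vectors. It uses only the standard library. The function names, the sparse dictionary representation, and the example profiles are assumptions made for illustration and are not taken from the CANDI code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as {target: score}."""
    dot = sum(score * b.get(target, 0.0) for target, score in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty or all-zero profiles
    return dot / (norm_a * norm_b)


def similarity_matrix(profiles):
    """Return (names, matrix) with pairwise cosine similarities."""
    names = sorted(profiles)
    matrix = [
        [cosine_similarity(profiles[row], profiles[col]) for col in names]
        for row in names
    ]
    return names, matrix


# Illustrative target-score profiles; not DRIFT predictions.
profiles = {
    "compound_a": {"T1": 0.9, "T2": 0.1},
    "compound_b": {"T1": 0.8, "T2": 0.2},
    "compound_c": {"T3": 1.0},
}
names, matrix = similarity_matrix(profiles)
```

The resulting nested list can be passed to `matplotlib.pyplot.imshow` to draw a heatmap. The same functions apply to pathway-score vectors if pathway names are used as keys instead of targets.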
Construction of CANDI Web Interface. We developed a user-friendly web interface using Flask, HTML5, CSS3, and JavaScript. HTML5 was used to structure the content of the web pages, while CSS3 was used for styling and layout customization. JavaScript was integrated to enhance user interactivity and functionality. On the back end, Flask (Python) handled data retrieval and processing tasks. The compatibility of CANDI was tested across popular web browsers such as Chrome and Firefox to ensure consistent performance and rendering.
Overall, CANDI provides users with an intuitive and versatile platform for accessing and analyzing cannabis compound data.
CANDI Web Interface. CANDI offers a suite of interactive modules, each tailored to address distinct stages of cannabis-based drug discovery (Fig. 1A).
Compound Search: This module serves as a comprehensive repository of information on individual cannabis-derived compounds (Fig. 1B). Users may search for specific compounds using various formats, including generic names, SMILES strings, and PubChem IDs. Upon searching, users can access detailed data and download the results in table format.
Predicted Molecular Targets: A curated list of proteins or receptors likely to interact with the compound is provided, based on experimental evidence and computational predictions, together with the corresponding predicted scores. The interface also includes a bar plot that visualizes the targets and their scores.
Similarity Search Results: A list of structurally similar compounds is provided, with similarity scores calculated from SMILES strings using the FP2 fingerprint, to help explore potential analogs with enhanced or altered pharmacological profiles.
Assay Data: When available, results from relevant biological assays are presented, offering insights into the compound's potency, with the value reported alongside the assay method.
Formulation: Recognizing the importance of the entourage effect, this module allows users to input a specific formulation of multiple cannabis compounds. CANDI then uses its underlying algorithms to efficiently predict the targets and map their relevant pathways (Fig. 1C). The module identifies potential molecular targets that the specific combination of compounds in the formulation may uniquely or preferentially modulate. The associated biological pathways likely to be impacted by the formulation are then mapped, highlighting potential therapeutic applications.
Compound-Target-Pathway Similarity: This module facilitates target-based drug discovery by enabling users to identify novel cannabis compounds based on their relationship to specific targets and pathways (Fig. 1D). Users can input a set of cannabis compounds, and CANDI employs a cosine similarity algorithm to assess the similarity between the input compounds, known targets, and pathways. This analysis identifies cannabis compounds predicted to interact with similar and potentially distinct targets and pathways. The method enhances our understanding of individual compounds and illuminates the complex network of interactions within biological systems. By providing a holistic view of the relationships between compounds and their targets, this feature aids in discerning combinations of compounds that may synergistically modulate multiple targets within a given pathway.

Results and Discussion
Cannabis sativa exhibits promising therapeutic potential, substantiated by accumulating scientific evidence. However, the development of standardized cannabis-based therapeutics is hampered by challenges inherent to the plant's phytochemical complexity. We employ a deep learning computational approach to predict molecular targets and associated pathways for cannabis formulations, with the aim of elucidating the synergistic effect. This research is facilitated by CANDI, a user-friendly web server designed to analyze compound-target interactions and therapeutic mechanisms comprehensively.
Architecture of CANDI. The CANDI web server is an integrated computational platform designed to facilitate the identification of molecular targets and associated pathways for user-specified formulations of cannabis-derived compounds (Fig. 2). The workflow begins with user input, which defines the specific combination and concentrations of cannabinoids, terpenes, and other relevant molecules of interest. Leveraging the DRIFT algorithm (Chirasani et al., 2022), a deep learning model trained on structural and chemical properties, the platform predicts potential targets for the compounds and assigns scores based on the likelihood of interaction. These scores are normalized and re-ranked, taking into account the user-specified formulation composition and concentrations. The ranked targets are mapped to their corresponding UniProt identifiers (The UniProt Consortium, 2023), enabling the identification of relevant pathways within the Reactome database (Milacic et al., 2024), a comprehensive resource of biological pathways and processes. The final output provided by CANDI is a ranked list of pathways, weighted by the number and scores of associated targets, offering insights into the potential mechanisms underlying the therapeutic effects of the specified cannabis formulation. This integrated computational approach enables researchers to systematically explore the intricate interplay between cannabis compounds and their molecular targets, accelerating the development of targeted therapies and elucidating the mechanistic underpinnings of cannabis-based therapeutics.
Case Study on Cannabis Oil Formulation. To validate CANDI's functionality, we performed a case study using a commercial cannabis oil formulation comprising varying proportions of cannabinoids and terpenes (Table 2). The formulation's composition, obtained from experimental data, was input into CANDI by modifying the platform's sample CSV file. CANDI presented the results in two sections: predicted molecular targets and associated pathways. The predicted targets section displayed a list ranked by predicted interaction score, with the highest-scoring targets listed first. For further reference, each target was linked to its corresponding UniProt entry. The analysis revealed that the formulation was predicted to interact with cannabinoid receptors CB1 and CB2, followed by G protein-coupled receptor 55 (GPR55), cytochrome P450 enzymes, and other receptors (Fig. 3A). The associated pathways section provided a detailed overview of the Reactome pathways linked to the predicted targets, ranked by their predicted score. Among the identified pathways were nuclear receptor transcription, G alpha (i) signaling events, the release of apoptotic factors from mitochondria, and SUMOylation of intracellular receptors, all implicated in various physiological processes (Fig. 3B). Hence, the analysis revealed that this formulation could modulate multiple targets and pathways associated with pain management, inflammation, and neurological disorders. The formulation was predicted to interact with the endocannabinoid system, including the CB1 and CB2 receptors. These interactions could contribute to the formulation's potential analgesic, anti-inflammatory, and neuroprotective effects (Donvito et al., 2018; Gonzalo-Consuegra et al., 2024). Furthermore, the analysis identified several relevant pathways related to pain perception and inflammation (Che, 2021; Zhao et al., 2020). Hence, the CANDI-generated hypothesis is that this formulation shows promise as a potential therapeutic agent for these conditions. Further research, including preclinical and clinical studies, is warranted to validate these findings and explore the full therapeutic potential of this formulation.
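For readers preparing a formulation file of the kind used in this case study, the following sketch shows one way a compound-amount table could be read and converted into weights. The actual layout of CANDI's sample CSV is not described here, so the column names `compound` and `amount`, the function name `load_formulation`, and the conversion of amounts into fractions that sum to 1 are all assumptions.

```python
import csv
import io


def load_formulation(csv_text):
    """Parse a formulation CSV and return {compound: fractional weight}.

    Assumes a header row with hypothetical columns 'compound' and 'amount'.
    """
    amounts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["compound"].strip()
        amount = float(row["amount"])
        if amount < 0:
            raise ValueError(f"negative amount for {name!r}")
        # Repeated rows for the same compound are added together.
        amounts[name] = amounts.get(name, 0.0) + amount

    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("formulation contains no positive amounts")
    return {name: amount / total for name, amount in amounts.items()}


# Made-up composition for illustration; not the formulation in Table 2.
sample_csv = "compound,amount\nTHC,30\nCBD,60\nmyrcene,10\n"
formulation_weights = load_formulation(sample_csv)
```

Converting amounts to fractions keeps the weights comparable between formulations reported in different units, provided each file uses a single unit throughout.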
Case Study on Cannabinoids. In the second case study, we analyzed a cannabis oil formulation containing only cannabinoids. The formulation was predicted to interact with cannabinoid receptors CB1 and CB2, DNA polymerase kappa, G protein-coupled receptor 55 (GPR55), vitamin receptor, and other receptors (Supp. Figure 1A). The associated pathways were G alpha (i) signaling events, Interleukin-4 and Interleukin-13 signaling, and neutrophil degranulation, all of which are associated with several biological activities (Supp. Figure 1B). Notably, this formulation was predicted to interact with the well-characterized CB1 and CB2 receptors, which are primary targets in cannabinoid research. These receptors are involved in various physiological processes, including pain modulation and inflammatory responses (Raup-Konsavage et al., 2023; Turcotte et al., 2016). In accord with our findings, the mapped pathways, particularly G alpha (i) signaling events and Interleukin-4 and Interleukin-13 signaling, have been implicated in pain perception and inflammatory processes (Ibsen et al., 2017; Oláh et al., 2017).
Case Study on Terpenes. In the third case study, a formulation containing only terpenes was analyzed, and its predicted targets and associated pathways were charted. The formulation was predicted to interact with solute carrier organic anion transporter family members 1B1 and 1B3, bile acid receptor FXR, arachidonate 15-lipoxygenase, and the cannabinoid CB2 receptor (Supp. Figure 2A). Associated pathways included nuclear receptor transcription, aspirin ADME, SUMOylation of intracellular receptors, Interleukin-4 and Interleukin-13 signaling, and G alpha (i) signaling events, all implicated in various biological processes (Supp. Figure 2B). These findings suggest that this terpene formulation may contribute to modulating diverse physiological functions through its interactions with these targets and pathways. In accord with our findings, the identified targets and pathways are commonly involved in inflammatory bowel disorders, various inflammatory diseases, and metabolic disorders (Del Prado-

Conclusion
The development of cannabis-based therapeutics holds significant potential for treating diverse medical conditions. However, this potential is constrained by the intricacy of the cannabis plant and the current lack of standardized, targeted therapies. This study represents a notable step toward overcoming these challenges by leveraging computational approaches, specifically deep learning algorithms. CANDI facilitates the identification of molecular targets and associated pathways for specific combinations of cannabis-derived compounds, addressing research gaps related to the entourage effect. Additionally, the user-friendly interface allows researchers to investigate the complex interplay between these compounds and their potential therapeutic targets. By integrating information on compound-target interactions and relevant biological pathways, CANDI facilitates a comprehensive analysis of the molecular mechanisms underlying the therapeutic effects of cannabis formulations and offers plausible hypotheses on the health outcomes of such compounds and formulations. Hence, the study contributes to the advancement of drug discovery efforts aimed at harnessing the therapeutic potential of cannabis compounds for the effective treatment of various disorders.

Declarations
Ethical Approval

Figures
Figure 1. The overall functioning of the CANDI web server. (A) Front page of the CANDI user interface for the compound, formulation, and similarity functions. (B) Compound information can be obtained for all 73 cannabis compounds using common names, SMILES, and PubChem IDs. By browsing the compound information, users can obtain the targets for the compound with their predicted scores, pharmacophore-similar compounds with similarity values, assay value and type, and a bar plot of targets versus scores. (C) The Formulation page lets users download the file, add their own values, and upload the file for target and pathway prediction. (i) Predicted targets are ranked according to score and linked to their corresponding UniProt entries. (ii) Pathways are mapped for the predicted targets from the Reactome database, and the pathway scores are shown in an interactive table. (D) The compound-target-pathway similarity page allows the user to provide cannabis compounds as input to identify compound-target and compound-pathway relationships. The output is a heatmap, with options to download the image file and the data in CSV format.

Table 1
A curated dataset of 73 compounds extracted from Cannabis sativa through experimental studies. This dataset encompasses 16 cannabinoids, 39 terpenes, and 18 flavonoids.

Table 2
Dataset of the cannabis compounds with formulation used for the case study